`bin_df_cols` leave input df unchanged #192

janosh · 2024-08-07T20:25:45Z

No description provided.

DanielYang59 · 2024-10-06T10:19:27Z

pymatviz/process_data.py

@@ -21,7 +21,7 @@ def count_elements(
    values: ElemValues,
    count_mode: ElemCountMode = ElemCountMode.composition,
    exclude_elements: Sequence[str] = (),
-    fill_value: float | None = 0,
+    fill_value: float | None = None,


@janosh This seems to cause the behaviour change in #215. Is this an accidental change? I think the original default value of 0 is correct here, i.e. using 0 for an element that is missing.

pymatviz/pymatviz/process_data.py

Line 55 in 82c087b

fill_value (float | None): Value to fill in for missing elements. Defaults to 0.

thanks for looking into this! 🙏 i think returning None for missing elements in count_elements is more correct/semantic than 0. so ideally, ptable_heatmap_ratio should be updated to work with the new behavior

> i think returning None for missing elements in count_elements is more correct/semantic than 0

I guess both choices have merit from a certain standpoint: None is more semantic in suggesting that a particular element is missing, while 0 is also valid since its count is indeed zero.

For this particular function, however, I slightly prefer using zero because it seems to be designed to "count element occurrence":

pymatviz/pymatviz/process_data.py

Lines 26 to 28 in 82c087b

    """Count element occurrence in list of formula strings or dict-like compositions.

    If passed values are already a map from element symbol to counts, ensure the
    data is a pd.Series filled with zero values for missing element symbols.

So its occurrence would be zero if that particular element doesn't appear in the data.
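To make the two options concrete, here is a minimal pandas sketch. It does not call pymatviz: `fill_missing` and the three-element `all_symbols` list are hypothetical stand-ins for how a count Series might be expanded to the full set of element symbols.

```python
from __future__ import annotations

import pandas as pd

# Hypothetical stand-in for the full list of element symbols (illustration only).
all_symbols = ["H", "He", "Li"]

# Counts for the elements that actually occur in the data.
counts = pd.Series({"H": 235.0, "Li": 967.66})


def fill_missing(series: pd.Series, fill_value: float | None) -> pd.Series:
    """Expand series to all_symbols; absent elements get fill_value (NaN if None)."""
    expanded = series.reindex(all_symbols)  # absent symbols become NaN
    if fill_value is not None:
        expanded = expanded.fillna(fill_value)
    return expanded


with_zero = fill_missing(counts, 0)     # He is reported as an explicit zero count
with_none = fill_missing(counts, None)  # He is reported as missing (NaN)
```

Under these assumptions, the zero-filled Series reads as "He occurs 0 times", while the NaN-filled one reads as "there is no data for He".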

> ideally, ptable_heatmap_ratio should be updated to work with the new behavior

I'm afraid this would have a broader impact, as there are quite a few usages that need to be checked:

For example:

    from pymatviz import count_elements
    from pymatviz.enums import ElemCountMode, Key
    from matminer.datasets import load_dataset

    dataset = load_dataset("matbench_expt_gap")[Key.composition]
    counted_dataset = count_elements(dataset, ElemCountMode.composition, fill_value=0)
    print(counted_dataset)

With fill value zero:

    symbol
    H      235.00
    He       0.00
    Li     967.66
    Be     121.00
    B     1168.00
           ...
    Fl       0.00
    Mc       0.00
    Lv       0.00
    Ts       0.00
    Og       0.00

With None:

    symbol
    H      235.00
    He        NaN
    Li     967.66
    Be     121.00
    B     1168.00
           ...
    Fl        NaN
    Mc        NaN
    Lv        NaN
    Ts        NaN
    Og        NaN

Using None seems to require an additional processing step.
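As a rough illustration of that extra step, the sketch below uses plain pandas on three of the values from the output above. It assumes downstream code wants explicit zero counts, and it also shows one way the two representations can diverge in later aggregations. The variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Counts as they might look with fill_value=None: missing elements are NaN.
# Values are copied from the output above; only three elements are shown.
counts_nan = pd.Series({"H": 235.0, "He": np.nan, "Li": 967.66}, name="count")

# The extra step: convert missing entries back to explicit zero counts.
counts_zero = counts_nan.fillna(0)

# Aggregations can differ because pandas skips NaN by default, so the mean is
# taken over 2 elements in one case and over 3 in the other.
mean_nan = counts_nan.mean()    # (235 + 967.66) / 2
mean_zero = counts_zero.mean()  # (235 + 0 + 967.66) / 3
```

Any caller that sums, averages, or divides these counts would need to decide which of the two behaviours it expects.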

So do we want to change this completely or revert this?

janosh added 3 commits August 6, 2024 14:14

count_elements default fill_value=0->None

d13e58f

ptable_heatmap_plotly default fig title pos to (x=0.4, y=0.95)

b6f1954

fix bin_df_cols modifying input df, now tested to leave input unchanged

0a4711c
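The last commit above concerns not mutating the caller's DataFrame. The sketch below shows the general pattern with a hypothetical helper, `add_binned_col`. It is not the actual `bin_df_cols` implementation, only an assumed minimal example of copying the input and testing that it is left unchanged.

```python
import pandas as pd


def add_binned_col(df: pd.DataFrame, col: str, n_bins: int = 2) -> pd.DataFrame:
    """Hypothetical helper: return a new frame with an extra '<col>_bin' column."""
    out = df.copy()  # work on a copy so the caller's frame is never mutated
    out[f"{col}_bin"] = pd.cut(out[col], bins=n_bins, labels=False)
    return out


df_in = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]})
df_before = df_in.copy()  # snapshot to compare against after the call

df_out = add_binned_col(df_in, "x")

# Raises AssertionError if the input frame was modified in place.
pd.testing.assert_frame_equal(df_in, df_before)
```

Comparing against a snapshot taken before the call is a simple way to turn "leaves input unchanged" into a regression test.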

janosh added the enhancement (Improvement to existing features/functionality) and ux (User experience) labels Aug 7, 2024

janosh temporarily deployed to github-pages August 7, 2024 20:28 — with GitHub Actions Inactive

pin scipy<1.14 since requires python 3.10

69f317d

janosh temporarily deployed to github-pages August 7, 2024 20:40 — with GitHub Actions Inactive

janosh enabled auto-merge (squash) August 7, 2024 20:43

janosh merged commit 0d8ec52 into main Aug 7, 2024
7 checks passed

janosh deleted the bin-df-cols-leave-input-df-unchanged branch August 7, 2024 20:43

DanielYang59 reviewed Oct 6, 2024


DanielYang59 mentioned this pull request Oct 8, 2024

[Breaking] Change default fill_value of count_elements from 0 to None #226

Open

2 tasks
